transfer attack


Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

Neural Information Processing Systems

Adversarial examples crafted on one model often exhibit poor transferability to others, hindering their effectiveness in black-box settings. This limitation arises from two key factors: (i) decision-boundary variation across models and (ii) representation drift in feature space. We address these challenges through a new perspective that frames transferability for untargeted attacks as a consensus-robust optimization problem: adversarial perturbations should remain effective across a neighborhood of plausible target models. To model this uncertainty, we introduce two complementary perturbation channels: a parameter channel, capturing boundary shifts via weight perturbations, and a representation channel, addressing feature drift via stochastic blending of clean and adversarial activations. We then propose CORTA (COnsensus-Robust Transfer Attack), a lightweight attack instantiated from this robust formulation using two first-order strategies: (i) sensitivity regularization based on the squared Frobenius norm of the logits' Jacobian with respect to the weights, and (ii) Monte Carlo sampling of blended feature representations. Our theoretical analysis provides a certified lower bound linking these approximations to the robust objective. Extensive experiments on CIFAR-100 and ImageNet show that CORTA significantly outperforms state-of-the-art transfer-based methods, including ensemble approaches, across CNN and Vision Transformer targets. Notably, CORTA achieves a 19.1 percentage-point gain in transfer success rate over the best prior method while using only a single surrogate model.
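The two first-order strategies can be illustrated on a toy surrogate. The sketch below is not the authors' CORTA implementation. It assumes a bias-free two-layer ReLU network z = W2 relu(W1 x), isotropic Gaussian weight perturbations of scale sigma, blend weights drawn from Uniform(0, 1), and an objective that simply subtracts a weighted penalty; all function names are hypothetical. One way to motivate the Frobenius-norm penalty: to first order z(θ + Δ) ≈ z(θ) + JΔ, and for Δ ~ N(0, σ²I) the expected squared logit shift E‖JΔ‖² equals σ²‖J‖_F². The Monte Carlo function checks this identity against the closed form.

```python
import numpy as np


def logits(x, W1, W2):
    """Toy surrogate (assumption): bias-free two-layer ReLU network z = W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)


def cross_entropy(z, y):
    """Numerically stable cross-entropy of logits z against integer label y."""
    z = z - z.max()
    return np.log(np.exp(z).sum()) - z[y]


def weight_jacobian_sq_fro(x, W1, W2):
    """Closed-form ||dz/d(W1, W2)||_F^2 for the toy network.

    dz_i/dW2[i, j] = h_j               -> contributes C * ||h||^2
    dz_i/dW1[j, k] = W2[i, j] m_j x_k  -> contributes ||W2 diag(m)||_F^2 * ||x||^2
    """
    a = W1 @ x
    h = np.maximum(a, 0.0)
    m = (a > 0).astype(float)  # ReLU gate, 0 or 1 per hidden unit
    C = W2.shape[0]
    return C * (h @ h) + np.sum((W2 * m) ** 2) * (x @ x)


def mc_first_order_shift(x, W1, W2, sigma, n, rng):
    """Monte Carlo estimate of E||J Delta||^2 with Delta ~ N(0, sigma^2 I) (assumed noise model)."""
    a = W1 @ x
    h = np.maximum(a, 0.0)
    m = (a > 0).astype(float)
    dW1 = rng.normal(0.0, sigma, size=(n,) + W1.shape)
    dW2 = rng.normal(0.0, sigma, size=(n,) + W2.shape)
    # Jacobian-vector product: dW2 @ h + W2 @ (m * (dW1 @ x)), one row per sample
    jvp = np.einsum("nch,h->nc", dW2, h) + np.einsum(
        "ch,nh->nc", W2, m * np.einsum("nhd,d->nh", dW1, x)
    )
    return np.mean(np.sum(jvp ** 2, axis=1))


def blended_loss(x_clean, x_adv, y, W1, W2, n_samples, rng):
    """Representation channel: mean CE over random clean/adversarial hidden-feature blends.

    Assumption: blend weight lam ~ Uniform(0, 1) on the hidden layer.
    """
    h_clean = np.maximum(W1 @ x_clean, 0.0)
    h_adv = np.maximum(W1 @ x_adv, 0.0)
    lams = rng.uniform(0.0, 1.0, size=n_samples)
    return np.mean([cross_entropy(W2 @ (l * h_clean + (1 - l) * h_adv), y) for l in lams])


def surrogate_objective(x_clean, x_adv, y, W1, W2, reg, n_samples, rng):
    """Hypothetical objective to maximize: blended loss minus a weight-sensitivity penalty."""
    penalty = weight_jacobian_sq_fro(x_adv, W1, W2)
    return blended_loss(x_clean, x_adv, y, W1, W2, n_samples, rng) - reg * penalty
```

An attacker would ascend `surrogate_objective` in `x_adv` under a perturbation budget; the gradient step itself is omitted here because it needs an autodiff library.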



Appendix of Synergy-of-experts: 1. Theoretical Proofs

Neural Information Processing Systems

From Figure 1(a), learning multiple linear sub-models and averaging their predictions (an ensemble) still yields a linear model, so it cannot solve the XOR problem. We compare the training cost of all methods from two aspects: 1) … The sub-model training ensures that most adversarial attacks on the sub-models can be successfully defended. In particular, we train two kinds of models to defend against the attacks: 1) … From Figures 2(a) and 2(b), when ϵ is between 0.01 and 0.04, SoE without collaboration training achieves robustness similar to that of SoE.
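The XOR remark can be checked directly. Averaging the scores of affine sub-models f_i(x) = w_i·x + b_i gives (mean w)·x + (mean b), which is again affine. Every affine f satisfies f(0,1) + f(1,0) = f(0,0) + f(1,1), so it cannot be positive on both XOR-positive points while non-positive on both XOR-negative points. The sketch below verifies both facts numerically for random sub-models; the function names are illustrative, and it does not model SoE's own combination mechanism.

```python
import numpy as np

# The four XOR inputs and their labels.
XOR_X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
XOR_Y = np.array([0, 1, 1, 0])


def ensemble_scores(X, weights, biases):
    """Average the scores of linear sub-models w_i . x + b_i (weights: (M, 2), biases: (M,))."""
    return np.mean(X @ weights.T + biases, axis=1)


def collapsed_scores(X, weights, biases):
    """A single linear model whose parameters are the sub-model averages."""
    return X @ weights.mean(axis=0) + biases.mean()


def xor_identity_gap(scores):
    """f(0,1) + f(1,0) - f(0,0) - f(1,1); zero for every affine f."""
    return scores[1] + scores[2] - scores[0] - scores[3]


def solves_xor(scores):
    """True if thresholding the scores at 0 reproduces the XOR labels."""
    return np.array_equal((scores > 0).astype(int), XOR_Y)
```

A non-affine score vector such as [-1, 1, 1, -1] passes `solves_xor`, which no averaged linear ensemble can produce.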



Adversarially Robust 3D Point Cloud Recognition Using Self-Supervisions: Supplementary Materials

Neural Information Processing Systems

In this section, we describe the implementation details of the adopted model architectures and self-supervised learning tasks. EdgeConv layers are stacked to form the DGCNN backbone. As introduced in Section 2.2, we choose k = 3, 4 for this task. We follow the attack setups in [13] to formulate our attack. We provide insights into how the different components contribute to the overall improvements.
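For readers unfamiliar with EdgeConv, a minimal NumPy sketch of one layer in the usual DGCNN form follows: build a k-nearest-neighbor graph, form edge features [x_i, x_j - x_i], apply a shared transform, and max-pool over neighbors. Assumptions: a single linear layer plus ReLU stands in for the shared MLP, the neighborhood size `n_neighbors` is a hypothetical parameter unrelated to the k = 3, 4 above, each point is excluded from its own neighbor set, and the weights are random rather than trained. Stacking layers while rebuilding the graph on the current features gives DGCNN's dynamic-graph behavior.

```python
import numpy as np


def knn_indices(feats, n_neighbors):
    """Indices of each point's n_neighbors nearest neighbors in feature space (self excluded)."""
    d2 = np.sum((feats[:, None, :] - feats[None, :, :]) ** 2, axis=-1)  # (N, N)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :n_neighbors]  # (N, k)


def edge_conv(feats, weight, bias, n_neighbors):
    """One EdgeConv layer: max over j of ReLU([x_i, x_j - x_i] @ W + b).

    Assumption: a single linear layer + ReLU replaces the shared MLP.
    """
    idx = knn_indices(feats, n_neighbors)
    neighbors = feats[idx]  # (N, k, F)
    center = np.broadcast_to(feats[:, None, :], neighbors.shape)  # (N, k, F)
    edges = np.concatenate([center, neighbors - center], axis=-1)  # (N, k, 2F)
    return np.maximum(edges @ weight + bias, 0.0).max(axis=1)  # (N, F_out)


def dgcnn_like_backbone(points, layers, n_neighbors):
    """Stack EdgeConv layers, recomputing the kNN graph on the current features each time."""
    feats = points
    for weight, bias in layers:
        feats = edge_conv(feats, weight, bias, n_neighbors)
    return feats
```

Because the neighbor sets and the max are order-independent, permuting the input points permutes the output rows in the same way.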


3ad7c2ebb96fcba7cda0cf54a2e802f5-Paper.pdf

Neural Information Processing Systems

Adversarial training, as a general robustness improvement technique, eliminates the vulnerability in a single model by forcing it to learn robust features. The process is hard, often requires models with large capacity, and suffers from a significant loss in clean-data accuracy.
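The min-max structure of adversarial training can be sketched on a toy problem. Assumptions, none of which come from the text above: binary logistic regression in NumPy, an L∞ budget `eps`, and a single-step FGSM inner maximization in place of a stronger multi-step attack. The outer loop minimizes the loss on the perturbed inputs rather than on the clean ones.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -500.0, 500.0)))


def loss_and_grads(w, b, X, y):
    """Mean logistic loss (labels in {0, 1}) and its gradients w.r.t. w, b, and the inputs X."""
    p = sigmoid(X @ w + b)
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    r = (p - y) / len(y)  # dLoss/dlogit for each example
    return loss, X.T @ r, r.sum(), np.outer(r, w)


def fgsm(w, b, X, y, eps):
    """Inner maximization (assumed single step): move each input eps along the sign of its gradient."""
    _, _, _, gX = loss_and_grads(w, b, X, y)
    return X + eps * np.sign(gX)


def adversarial_train(X, y, eps, lr=0.5, epochs=200):
    """Outer minimization: gradient descent on the loss evaluated at FGSM-perturbed inputs."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        X_adv = fgsm(w, b, X, y, eps)
        _, gw, gb, _ = loss_and_grads(w, b, X_adv, y)
        w, b = w - lr * gw, b - lr * gb
    return w, b
```

For a linear model, FGSM shifts every logit toward the wrong class by eps·‖w‖₁, so the perturbed loss is always at least the clean loss; the training loop pushes the decision boundary far enough from the data that this shift no longer flips predictions.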


Blurred-Dilated Method for Adversarial Attacks

Neural Information Processing Systems

Deep neural networks (DNNs) are vulnerable to adversarial attacks, which cause them to make incorrect predictions. In black-box settings, transfer attacks offer a convenient way to generate adversarial examples. However, such examples tend to overfit the specific architecture and feature representations of the source model, resulting in poor attack performance against other target models. To overcome this drawback, this paper proposes the Blurred-Dilated method (BD), a novel transfer attack based on modifying the source model. In summary, BD reduces downsampling in the source model while introducing BlurPool and dilated convolutions.
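The two building blocks named above can be illustrated in 1-D NumPy. BlurPool (anti-aliased downsampling) low-pass filters a signal before subsampling, and a dilated convolution spaces its kernel taps apart to enlarge the receptive field without any downsampling. The [1, 2, 1]/4 blur kernel, the 1-D setting, and the function names are assumptions for illustration; this is not the BD authors' code and does not show which layers of a source network they modify.

```python
import numpy as np


def blur_pool_1d(x, stride=2):
    """Anti-aliased downsampling (assumed [1, 2, 1]/4 kernel): low-pass filter, then subsample."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    padded = np.pad(x, 1, mode="reflect")
    blurred = np.convolve(padded, kernel, mode="valid")  # same length as x
    return blurred[::stride]


def naive_subsample_1d(x, stride=2):
    """Plain strided subsampling with no low-pass filter, for comparison."""
    return x[::stride]


def dilated_conv_1d(x, kernel, dilation):
    """'Valid' 1-D cross-correlation whose taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of a single output
    n_out = len(x) - span + 1
    return np.array(
        [sum(kernel[t] * x[i + t * dilation] for t in range(len(kernel))) for i in range(n_out)]
    )
```

On an alternating signal [1, -1, 1, -1, ...], naive stride-2 subsampling returns all ones (the highest frequency aliases to a constant), whereas `blur_pool_1d` returns all zeros. With a 3-tap kernel, dilation 2 widens one output's receptive field from 3 to 5 samples.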


DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles

Neural Information Processing Systems

Recent research finds that CNN models for image classification exhibit overlapping adversarial vulnerabilities: adversarial attacks can mislead CNN models with small perturbations, and these perturbations transfer effectively between different models trained on the same dataset. Adversarial training, as a general robustness improvement technique, eliminates the vulnerability in a single model by forcing it to learn robust features. The process is hard, often requires models with large capacity, and suffers from a significant loss in clean-data accuracy. Alternatively, ensemble methods have been proposed to induce sub-models with diverse outputs against a transfer adversarial example, making the ensemble robust against transfer attacks even if each sub-model is individually non-robust. Only a small drop in clean accuracy is observed in the process. However, previous ensemble training methods are not effective at inducing such diversity and thus fail to produce a robust ensemble. We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features, and diversifies the adversarial vulnerability to induce diverse outputs against a transfer attack. The novel diversity metric and training procedure enable DVERGE to achieve higher robustness against transfer attacks than previous ensemble methods, and its robustness keeps improving as more sub-models are added to the ensemble.
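Feature distillation can be sketched as a constrained optimization. This assumes it is posed as min over z of ‖φ(z) - φ(x_s)‖² subject to ‖z - x_t‖∞ ≤ ε: the result stays visually close to a target input x_t while carrying the intermediate features of a source input x_s. The toy feature map φ(z) = relu(Az), the signed-gradient PGD solver, the best-iterate tracking, and all names are assumptions for illustration, not DVERGE's code.

```python
import numpy as np


def features(z, A):
    """Toy feature extractor (assumption): phi(z) = relu(A @ z), standing in for a sub-model layer."""
    return np.maximum(A @ z, 0.0)


def distill_features(x_target, x_source, A, eps, step, n_steps):
    """Signed-gradient PGD on ||phi(z) - phi(x_source)||^2 inside an L-inf ball around x_target."""
    goal = features(x_source, A)

    def objective(z):
        return np.sum((features(z, A) - goal) ** 2)

    z = x_target.copy()
    best_z, best_obj = z.copy(), objective(z)
    for _ in range(n_steps):
        pre = A @ z
        # Chain rule through the ReLU: only active units pass gradient.
        grad = A.T @ (2.0 * (np.maximum(pre, 0.0) - goal) * (pre > 0))
        # Descend along the sign of the gradient, then project back into the eps-ball.
        z = np.clip(z - step * np.sign(grad), x_target - eps, x_target + eps)
        obj = objective(z)
        if obj < best_obj:  # keep the best iterate found so far
            best_z, best_obj = z.copy(), obj
    return best_z
```

In the abstract's terms, such distilled inputs expose a sub-model's non-robust features; how DVERGE uses them in its diversity metric and training procedure is not reproduced here.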


Text Prompt Injection of Vision Language Models

arXiv.org Artificial Intelligence

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method for misleading these models. We develop an algorithm for this type of attack and demonstrate its effectiveness and efficiency through experiments. Compared to other attack methods, our approach is particularly effective against large models without demanding substantial computational resources.